Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 8154 |
| Missing cells | 24830 |
| Missing cells (%) | 20.3% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 955.7 KiB |
| Average record size in memory | 120.0 B |
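
The percentages and averages in this table follow from the raw counts. As a minimal sketch (the variable names are illustrative and not part of the report), the figures can be cross-checked with plain arithmetic, including the per-variable missing counts listed later in the report:

```python
# Values copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 8154
missing_cells = 24830
total_size_kib = 955.7

# Per-variable missing counts as listed in the report's missing-value warnings.
missing_by_variable = {
    "tests_positive": 260,
    "tests_negative": 373,
    "tests_pending": 7181,
    "tests": 260,
    "patients_icu": 5320,
    "patients_hosp": 2590,
    "patients_vent": 5674,
    "recovered": 3172,
}

total_cells = n_variables * n_observations            # rows x columns
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Sum of per-variable missing counts: {sum(missing_by_variable.values())}")
```

The per-variable counts add up to the reported total of missing cells, so the summary and the individual warnings are consistent with each other.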
Variable types
| NUM | 13 |
|---|---|
| CAT | 2 |
Reproduction
| Analysis started | 2020-08-09 16:03:19.502680 |
|---|---|
| Analysis finished | 2020-08-09 16:03:42.441136 |
| Duration | 22.94 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Warning | Type |
|---|---|
| state_name has a high cardinality: 55 distinct values | High cardinality |
| date has a high cardinality: 190 distinct values | High cardinality |
| tests_positive is highly correlated with cases | High correlation |
| cases is highly correlated with tests_positive | High correlation |
| tests is highly correlated with tests_negative | High correlation |
| tests_negative is highly correlated with tests | High correlation |
| patients_hosp is highly correlated with patients_icu and 1 other fields | High correlation |
| patients_icu is highly correlated with patients_hosp and 1 other fields | High correlation |
| patients_vent is highly correlated with patients_icu and 1 other fields | High correlation |
| tests_positive has 260 (3.2%) missing values | Missing |
| tests_negative has 373 (4.6%) missing values | Missing |
| tests_pending has 7181 (88.1%) missing values | Missing |
| tests has 260 (3.2%) missing values | Missing |
| patients_icu has 5320 (65.2%) missing values | Missing |
| patients_hosp has 2590 (31.8%) missing values | Missing |
| patients_vent has 5674 (69.6%) missing values | Missing |
| recovered has 3172 (38.9%) missing values | Missing |
| deaths has 956 (11.7%) zeros | Zeros |
state_fips
Real number (ℝ≥0)
| Distinct count | 55 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31.764410105469707 |
| Minimum | 1 |
| Maximum | 78 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 17 |
| median | 31 |
| Q3 | 46 |
| 95-th percentile | 66 |
| Maximum | 78 |
| Range | 77 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 18.56159646 |
|---|---|
| Coefficient of variation (CV) | 0.5843519963 |
| Kurtosis | -0.5244100047 |
| Mean | 31.76441011 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 0.3433304792 |
| Sum | 259007 |
| Variance | 344.532863 |
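
Several of the descriptive statistics are derived from one another. The sketch below recomputes them for state_fips from the reported values; the variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
import math

# Values copied from the state_fips tables.
n = 8154                          # number of observations (no missing values)
mean_full = 31.764410105469707    # mean as shown in the overview table
mean = 31.76441011
std = 18.56159646
q1, q3 = 17, 46
minimum, maximum = 1, 78

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum
total = mean_full * n             # sum = mean x number of observations

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {round(total)}")
```

The same relations hold for every numeric variable in the report, with `n` replaced by the number of non-missing observations where values are missing.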
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 53 | 190 | 2.3% | |
| 17 | 187 | 2.3% | |
| 6 | 186 | 2.3% | |
| 4 | 185 | 2.3% | |
| 25 | 179 | 2.2% | |
| 55 | 175 | 2.1% | |
| 48 | 168 | 2.1% | |
| 31 | 163 | 2.0% | |
| 49 | 155 | 1.9% | |
| 41 | 152 | 1.9% | |
| Other values (45) | 6414 | 78.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 138 | 1.7% | |
| 2 | 139 | 1.7% | |
| 4 | 185 | 2.3% | |
| 5 | 140 | 1.7% | |
| 6 | 186 | 2.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 78 | 137 | 1.7% | |
| 72 | 138 | 1.7% | |
| 69 | 123 | 1.5% | |
| 66 | 136 | 1.7% | |
| 56 | 140 | 1.7% |
state_name
| Distinct count | 55 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 63.7 KiB |
| Washington | 190 |
|---|---|
| Illinois | 187 |
| California | 186 |
| Arizona | 185 |
| Massachusetts | 179 |
| Other values (50) |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Washington | 190 | 2.3% | |
| Illinois | 187 | 2.3% | |
| California | 186 | 2.3% | |
| Arizona | 185 | 2.3% | |
| Massachusetts | 179 | 2.2% | |
| Wisconsin | 175 | 2.1% | |
| Texas | 168 | 2.1% | |
| Nebraska | 163 | 2.0% | |
| Utah | 155 | 1.9% | |
| Oregon | 152 | 1.9% | |
| Other values (45) | 6414 | 78.7% |
Length
| Max length | 44 |
|---|---|
| Median length | 8 |
| Mean length | 9.494358597 |
| Min length | 4 |
lat
Real number (ℝ≥0)
| Distinct count | 55 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.03802439134168 |
| Minimum | 13.4417451 |
| Maximum | 63.347356 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 13.4417451 |
|---|---|
| 5-th percentile | 18.326748 |
| Q1 | 34.8955256 |
| median | 38.9985661 |
| Q3 | 42.9896591 |
| 95-th percentile | 47.4073238 |
| Maximum | 63.347356 |
| Range | 49.9056109 |
| Interquartile range (IQR) | 8.0941335 |
Descriptive statistics
| Standard deviation | 8.339818492 |
|---|---|
| Coefficient of variation (CV) | 0.2192495174 |
| Kurtosis | 2.31621715 |
| Mean | 38.03802439 |
| Median Absolute Deviation (MAD) | 4.1030405 |
| Skewness | -0.8039084413 |
| Sum | 310162.0509 |
| Variance | 69.55257248 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 47.4073238 | 190 | 2.3% | |
| 40.1028754 | 187 | 2.3% | |
| 37.1551773 | 186 | 2.3% | |
| 34.2039355 | 185 | 2.3% | |
| 42.1565196 | 179 | 2.2% | |
| 44.6309071 | 175 | 2.1% | |
| 31.4347032 | 168 | 2.1% | |
| 41.5433053 | 163 | 2.0% | |
| 39.3349925 | 155 | 1.9% | |
| 43.9717125 | 152 | 1.9% | |
| Other values (45) | 6414 | 78.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13.4417451 | 136 | 1.7% | |
| 14.9367835 | 123 | 1.5% | |
| 18.217648 | 138 | 1.7% | |
| 18.326748 | 137 | 1.7% | |
| 19.5977643 | 145 | 1.8% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 63.347356 | 139 | 1.7% | |
| 47.442174 | 140 | 1.7% | |
| 47.4073238 | 190 | 2.3% | |
| 47.0511771 | 138 | 1.7% | |
| 46.3159573 | 145 | 1.8% |
long
Real number (ℝ)
| Distinct count | 55 |
|---|---|
| Unique (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -85.15719407553347 |
| Minimum | -155.5024434 |
| Maximum | 145.601021 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | -155.5024434 |
|---|---|
| 5-th percentile | -120.6229578 |
| Q1 | -100.4608258 |
| median | -89.1526108 |
| Q3 | -77.0165167 |
| 95-th percentile | -66.4107992 |
| Maximum | 145.601021 |
| Range | 301.1034644 |
| Interquartile range (IQR) | 23.4443091 |
Descriptive statistics
| Standard deviation | 45.82399885 |
|---|---|
| Coefficient of variation (CV) | -0.5381107181 |
| Kurtosis | 17.51004852 |
| Mean | -85.15719408 |
| Median Absolute Deviation (MAD) | 12.1360941 |
| Skewness | 3.863192849 |
| Sum | -694371.7605 |
| Variance | 2099.838871 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -120.5757999 | 190 | 2.3% | |
| -89.1526108 | 187 | 2.3% | |
| -119.5434183 | 186 | 2.3% | |
| -111.6063565 | 185 | 2.3% | |
| -71.4895915 | 179 | 2.2% | |
| -89.7093916 | 175 | 2.1% | |
| -99.2818238 | 168 | 2.1% | |
| -99.8118646 | 163 | 2.0% | |
| -111.6563326 | 155 | 1.9% | |
| -120.6229578 | 152 | 1.9% | |
| Other values (45) | 6414 | 78.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| -155.5024434 | 145 | 1.8% | |
| -152.8397334 | 139 | 1.7% | |
| -120.6229578 | 152 | 1.9% | |
| -120.5757999 | 190 | 2.3% | |
| -119.5434183 | 186 | 2.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 145.601021 | 123 | 1.5% | |
| 144.7719021 | 136 | 1.7% | |
| -64.9712508 | 137 | 1.7% | |
| -66.4107992 | 138 | 1.7% | |
| -68.666616 | 139 | 1.7% |
date
| Distinct count | 190 |
|---|---|
| Unique (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 63.7 KiB |
| 6/12/2020 | 55 |
|---|---|
| 7/27/2020 | 55 |
| 4/9/2020 | 55 |
| 7/14/2020 | 55 |
| 4/4/2020 | 55 |
| Other values (185) |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6/12/2020 | 55 | 0.7% | |
| 7/27/2020 | 55 | 0.7% | |
| 4/9/2020 | 55 | 0.7% | |
| 7/14/2020 | 55 | 0.7% | |
| 4/4/2020 | 55 | 0.7% | |
| 4/6/2020 | 55 | 0.7% | |
| 7/2/2020 | 55 | 0.7% | |
| 4/30/2020 | 55 | 0.7% | |
| 7/9/2020 | 55 | 0.7% | |
| 5/6/2020 | 55 | 0.7% | |
| Other values (180) | 7604 | 93.3% |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.724675006 |
| Min length | 8 |
cases
| Distinct count | 6097 |
|---|---|
| Unique (%) | 74.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28152.917709099827 |
| Minimum | 1 |
| Maximum | 474951 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 716.25 |
| median | 6169 |
| Q3 | 28682.5 |
| 95-th percentile | 123787 |
| Maximum | 474951 |
| Range | 474950 |
| Interquartile range (IQR) | 27966.25 |
Descriptive statistics
| Standard deviation | 59724.32873 |
|---|---|
| Coefficient of variation (CV) | 2.121425898 |
| Kurtosis | 20.47032086 |
| Mean | 28152.91771 |
| Median Absolute Deviation (MAD) | 6137 |
| Skewness | 4.182934106 |
| Sum | 229558891 |
| Variance | 3566995443 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 216 | 2.6% | |
| 2 | 113 | 1.4% | |
| 3 | 38 | 0.5% | |
| 6 | 35 | 0.4% | |
| 30 | 34 | 0.4% | |
| 14 | 34 | 0.4% | |
| 5 | 32 | 0.4% | |
| 4 | 32 | 0.4% | |
| 11 | 30 | 0.4% | |
| 7 | 29 | 0.4% | |
| Other values (6087) | 7561 | 92.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 216 | 2.6% | |
| 2 | 113 | 1.4% | |
| 3 | 38 | 0.5% | |
| 4 | 32 | 0.4% | |
| 5 | 32 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 474951 | 1 | < 0.1% | |
| 467103 | 1 | < 0.1% | |
| 459338 | 1 | < 0.1% | |
| 453327 | 1 | < 0.1% | |
| 443096 | 1 | < 0.1% |
deaths
| Distinct count | 2572 |
|---|---|
| Unique (%) | 31.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1355.8254844248222 |
| Minimum | 0 |
| Maximum | 32333 |
| Zeros | 956 |
| Zeros (%) | 11.7% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 12 |
| median | 174 |
| Q3 | 988.75 |
| 95-th percentile | 6305.05 |
| Maximum | 32333 |
| Range | 32333 |
| Interquartile range (IQR) | 976.75 |
Descriptive statistics
| Standard deviation | 3730.562247 |
|---|---|
| Coefficient of variation (CV) | 2.751506215 |
| Kurtosis | 39.03443716 |
| Mean | 1355.825484 |
| Median Absolute Deviation (MAD) | 174 |
| Skewness | 5.771885709 |
| Sum | 11055401 |
| Variance | 13917094.68 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 956 | 11.7% | |
| 6 | 233 | 2.9% | |
| 2 | 210 | 2.6% | |
| 1 | 163 | 2.0% | |
| 17 | 93 | 1.1% | |
| 7 | 86 | 1.1% | |
| 3 | 78 | 1.0% | |
| 4 | 63 | 0.8% | |
| 8 | 62 | 0.8% | |
| 16 | 58 | 0.7% | |
| Other values (2562) | 6152 | 75.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 956 | 11.7% | |
| 1 | 163 | 2.0% | |
| 2 | 210 | 2.6% | |
| 3 | 78 | 1.0% | |
| 4 | 63 | 0.8% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 32333 | 1 | < 0.1% | |
| 32322 | 1 | < 0.1% | |
| 32305 | 1 | < 0.1% | |
| 32295 | 1 | < 0.1% | |
| 32278 | 1 | < 0.1% |
tests_positive
| Distinct count | 5916 |
|---|---|
| Unique (%) | 74.9% |
| Missing | 260 |
| Missing (%) | 3.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28925.35596655688 |
| Minimum | 0.0 |
| Maximum | 466550.0 |
| Zeros | 19 |
| Zeros (%) | 0.2% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 829.25 |
| median | 6948 |
| Q3 | 30027.75 |
| 95-th percentile | 128459.05 |
| Maximum | 466550 |
| Range | 466550 |
| Interquartile range (IQR) | 29198.5 |
Descriptive statistics
| Standard deviation | 59715.62184 |
|---|---|
| Coefficient of variation (CV) | 2.064473188 |
| Kurtosis | 19.5677898 |
| Mean | 28925.35597 |
| Median Absolute Deviation (MAD) | 6873.5 |
| Skewness | 4.088455613 |
| Sum | 228336760 |
| Variance | 3565955491 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 73 | 0.9% | |
| 2 | 59 | 0.7% | |
| 3 | 37 | 0.5% | |
| 30 | 36 | 0.4% | |
| 6 | 32 | 0.4% | |
| 14 | 31 | 0.4% | |
| 5 | 27 | 0.3% | |
| 69 | 26 | 0.3% | |
| 22 | 24 | 0.3% | |
| 8 | 24 | 0.3% | |
| Other values (5906) | 7525 | 92.3% | |
| (Missing) | 260 | 3.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 19 | 0.2% | |
| 1 | 73 | 0.9% | |
| 2 | 59 | 0.7% | |
| 3 | 37 | 0.5% | |
| 4 | 18 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 466550 | 1 | < 0.1% | |
| 460550 | 1 | < 0.1% | |
| 453659 | 1 | < 0.1% | |
| 445400 | 1 | < 0.1% | |
| 441977 | 1 | < 0.1% |
tests_negative
| Distinct count | 6689 |
|---|---|
| Unique (%) | 86.0% |
| Missing | 373 |
| Missing (%) | 4.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 277888.0647731654 |
| Minimum | 0.0 |
| Maximum | 6951316.0 |
| Zeros | 38 |
| Zeros (%) | 0.5% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 131 |
| Q1 | 15215 |
| median | 84592 |
| Q3 | 292317 |
| 95-th percentile | 1113717 |
| Maximum | 6951316 |
| Range | 6951316 |
| Interquartile range (IQR) | 277102 |
Descriptive statistics
| Standard deviation | 586718.6365 |
|---|---|
| Coefficient of variation (CV) | 2.111348816 |
| Kurtosis | 37.33805232 |
| Mean | 277888.0648 |
| Median Absolute Deviation (MAD) | 81116 |
| Skewness | 5.287544843 |
| Sum | 2162247032 |
| Variance | 3.442387584e+11 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 38 | 0.5% | |
| 36 | 25 | 0.3% | |
| 94 | 20 | 0.2% | |
| 113280 | 17 | 0.2% | |
| 8187 | 16 | 0.2% | |
| 255766 | 15 | 0.2% | |
| 22092 | 14 | 0.2% | |
| 6088 | 13 | 0.2% | |
| 9313 | 13 | 0.2% | |
| 27 | 11 | 0.1% | |
| Other values (6679) | 7599 | 93.2% | |
| (Missing) | 373 | 4.6% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 38 | 0.5% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 5 | 3 | < 0.1% | |
| 7 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6951316 | 1 | < 0.1% | |
| 6836028 | 1 | < 0.1% | |
| 6714480 | 1 | < 0.1% | |
| 6601955 | 1 | < 0.1% | |
| 6480542 | 1 | < 0.1% |
tests_pending
| Distinct count | 486 |
|---|---|
| Unique (%) | 49.9% |
| Missing | 7181 |
| Missing (%) | 88.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1225.813977389517 |
| Minimum | 0.0 |
| Maximum | 64400.0 |
| Zeros | 29 |
| Zeros (%) | 0.4% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 24 |
| median | 158 |
| Q3 | 518 |
| 95-th percentile | 1944.4 |
| Maximum | 64400 |
| Range | 64400 |
| Interquartile range (IQR) | 494 |
Descriptive statistics
| Standard deviation | 6167.91771 |
|---|---|
| Coefficient of variation (CV) | 5.031691451 |
| Kurtosis | 80.01360356 |
| Mean | 1225.813977 |
| Median Absolute Deviation (MAD) | 149 |
| Skewness | 8.767773285 |
| Sum | 1192717 |
| Variance | 38043208.88 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 29 | 0.4% | |
| 2 | 23 | 0.3% | |
| 6 | 18 | 0.2% | |
| 5 | 15 | 0.2% | |
| 20 | 15 | 0.2% | |
| 4 | 15 | 0.2% | |
| 1 | 13 | 0.2% | |
| 12 | 10 | 0.1% | |
| 35 | 10 | 0.1% | |
| 7 | 10 | 0.1% | |
| Other values (476) | 815 | 10.0% | |
| (Missing) | 7181 | 88.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 29 | 0.4% | |
| 1 | 13 | 0.2% | |
| 2 | 23 | 0.3% | |
| 3 | 9 | 0.1% | |
| 4 | 15 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 64400 | 3 | < 0.1% | |
| 59500 | 2 | < 0.1% | |
| 57400 | 4 | < 0.1% | |
| 48600 | 1 | < 0.1% | |
| 15000 | 2 | < 0.1% |
tests
| Distinct count | 7146 |
|---|---|
| Unique (%) | 90.5% |
| Missing | 260 |
| Missing (%) | 3.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 302835.54497086396 |
| Minimum | 0.0 |
| Maximum | 7417866.0 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 111.65 |
| Q1 | 14927 |
| median | 90510 |
| Q3 | 317251 |
| 95-th percentile | 1219437.55 |
| Maximum | 7417866 |
| Range | 7417866 |
| Interquartile range (IQR) | 302324 |
Descriptive statistics
| Standard deviation | 636042.9069 |
|---|---|
| Coefficient of variation (CV) | 2.100291453 |
| Kurtosis | 35.49400446 |
| Mean | 302835.545 |
| Median Absolute Deviation (MAD) | 87740.5 |
| Skewness | 5.164044338 |
| Sum | 2390583792 |
| Variance | 4.045505794e+11 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 33 | 0.4% | |
| 1 | 24 | 0.3% | |
| 65 | 11 | 0.1% | |
| 17 | 11 | 0.1% | |
| 8217 | 10 | 0.1% | |
| 12 | 10 | 0.1% | |
| 38 | 9 | 0.1% | |
| 18 | 9 | 0.1% | |
| 8169 | 9 | 0.1% | |
| 7 | 9 | 0.1% | |
| Other values (7136) | 7759 | 95.2% | |
| (Missing) | 260 | 3.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2 | < 0.1% | |
| 1 | 24 | 0.3% | |
| 2 | 33 | 0.4% | |
| 3 | 9 | 0.1% | |
| 4 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 7417866 | 1 | < 0.1% | |
| 7296578 | 1 | < 0.1% | |
| 7168139 | 1 | < 0.1% | |
| 7047355 | 1 | < 0.1% | |
| 6915876 | 1 | < 0.1% |
patients_icu
| Distinct count | 864 |
|---|---|
| Unique (%) | 30.5% |
| Missing | 5320 |
| Missing (%) | 65.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 378.5363443895554 |
| Minimum | 0.0 |
| Maximum | 5225.0 |
| Zeros | 9 |
| Zeros (%) | 0.1% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 63 |
| median | 150 |
| Q3 | 362 |
| 95-th percentile | 1502 |
| Maximum | 5225 |
| Range | 5225 |
| Interquartile range (IQR) | 299 |
Descriptive statistics
| Standard deviation | 653.4553289 |
|---|---|
| Coefficient of variation (CV) | 1.726268398 |
| Kurtosis | 19.69100206 |
| Mean | 378.5363444 |
| Median Absolute Deviation (MAD) | 117 |
| Skewness | 3.920168781 |
| Sum | 1072772 |
| Variance | 427003.8668 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9 | 49 | 0.6% | |
| 8 | 32 | 0.4% | |
| 2 | 28 | 0.3% | |
| 11 | 26 | 0.3% | |
| 10 | 25 | 0.3% | |
| 18 | 24 | 0.3% | |
| 13 | 22 | 0.3% | |
| 7 | 20 | 0.2% | |
| 14 | 19 | 0.2% | |
| 12 | 18 | 0.2% | |
| Other values (854) | 2571 | 31.5% | |
| (Missing) | 5320 | 65.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 9 | 0.1% | |
| 2 | 28 | 0.3% | |
| 3 | 6 | 0.1% | |
| 4 | 12 | 0.1% | |
| 5 | 14 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 5225 | 1 | < 0.1% | |
| 5205 | 1 | < 0.1% | |
| 5198 | 1 | < 0.1% | |
| 5156 | 1 | < 0.1% | |
| 5071 | 1 | < 0.1% |
patients_hosp
| Distinct count | 1914 |
|---|---|
| Unique (%) | 34.4% |
| Missing | 2590 |
| Missing (%) | 31.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 960.9338605319914 |
| Minimum | 0.0 |
| Maximum | 18825.0 |
| Zeros | 14 |
| Zeros (%) | 0.2% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 13 |
| Q1 | 95 |
| median | 380 |
| Q3 | 924.5 |
| 95-th percentile | 4322.4 |
| Maximum | 18825 |
| Range | 18825 |
| Interquartile range (IQR) | 829.5 |
Descriptive statistics
| Standard deviation | 1875.26954 |
|---|---|
| Coefficient of variation (CV) | 1.951507401 |
| Kurtosis | 28.85531246 |
| Mean | 960.9338605 |
| Median Absolute Deviation (MAD) | 326 |
| Skewness | 4.691128774 |
| Sum | 5346636 |
| Variance | 3516635.849 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13 | 41 | 0.5% | |
| 19 | 34 | 0.4% | |
| 14 | 34 | 0.4% | |
| 12 | 34 | 0.4% | |
| 3 | 32 | 0.4% | |
| 29 | 28 | 0.3% | |
| 8 | 28 | 0.3% | |
| 10 | 27 | 0.3% | |
| 16 | 27 | 0.3% | |
| 24 | 27 | 0.3% | |
| Other values (1904) | 5252 | 64.4% | |
| (Missing) | 2590 | 31.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 14 | 0.2% | |
| 1 | 13 | 0.2% | |
| 2 | 14 | 0.2% | |
| 3 | 32 | 0.4% | |
| 4 | 17 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 18825 | 1 | < 0.1% | |
| 18707 | 1 | < 0.1% | |
| 18697 | 1 | < 0.1% | |
| 18654 | 1 | < 0.1% | |
| 18569 | 1 | < 0.1% |
patients_vent
| Distinct count | 560 |
|---|---|
| Unique (%) | 22.6% |
| Missing | 5674 |
| Missing (%) | 69.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 179.76008064516128 |
| Minimum | 0.0 |
| Maximum | 2425.0 |
| Zeros | 26 |
| Zeros (%) | 0.3% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 26 |
| median | 85 |
| Q3 | 192 |
| 95-th percentile | 739.05 |
| Maximum | 2425 |
| Range | 2425 |
| Interquartile range (IQR) | 166 |
Descriptive statistics
| Standard deviation | 285.304029 |
|---|---|
| Coefficient of variation (CV) | 1.587137856 |
| Kurtosis | 13.70653208 |
| Mean | 179.7600806 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | 3.336826306 |
| Sum | 445805 |
| Variance | 81398.38897 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 45 | 0.6% | |
| 4 | 45 | 0.6% | |
| 7 | 43 | 0.5% | |
| 5 | 38 | 0.5% | |
| 3 | 37 | 0.5% | |
| 1 | 31 | 0.4% | |
| 6 | 30 | 0.4% | |
| 0 | 26 | 0.3% | |
| 14 | 23 | 0.3% | |
| 17 | 22 | 0.3% | |
| Other values (550) | 2140 | 26.2% | |
| (Missing) | 5674 | 69.6% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 26 | 0.3% | |
| 1 | 31 | 0.4% | |
| 2 | 45 | 0.6% | |
| 3 | 37 | 0.5% | |
| 4 | 45 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2425 | 1 | < 0.1% | |
| 2295 | 1 | < 0.1% | |
| 2203 | 1 | < 0.1% | |
| 2073 | 1 | < 0.1% | |
| 2020 | 1 | < 0.1% |
recovered
| Distinct count | 3057 |
|---|---|
| Unique (%) | 61.4% |
| Missing | 3172 |
| Missing (%) | 38.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12244.892814130872 |
| Minimum | 2.0 |
| Maximum | 244449.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 49 |
| Q1 | 686 |
| median | 3237 |
| Q3 | 13363.5 |
| 95-th percentile | 59187.8 |
| Maximum | 244449 |
| Range | 244447 |
| Interquartile range (IQR) | 12677.5 |
Descriptive statistics
| Standard deviation | 21744.18362 |
|---|---|
| Coefficient of variation (CV) | 1.775775742 |
| Kurtosis | 18.62266767 |
| Mean | 12244.89281 |
| Median Absolute Deviation (MAD) | 2970.5 |
| Skewness | 3.464229693 |
| Sum | 61004056 |
| Variance | 472809521.3 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 19 | 40 | 0.5% | |
| 15 | 26 | 0.3% | |
| 12 | 23 | 0.3% | |
| 61 | 20 | 0.2% | |
| 29 | 18 | 0.2% | |
| 15642 | 18 | 0.2% | |
| 64 | 16 | 0.2% | |
| 35 | 14 | 0.2% | |
| 51 | 13 | 0.2% | |
| 93157 | 13 | 0.2% | |
| Other values (3047) | 4781 | 58.6% | |
| (Missing) | 3172 | 38.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 7 | 6 | 0.1% | |
| 9 | 7 | 0.1% | |
| 11 | 9 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 244449 | 1 | < 0.1% | |
| 229107 | 2 | < 0.1% | |
| 221510 | 1 | < 0.1% | |
| 212216 | 1 | < 0.1% | |
| 203826 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
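
As an illustration of the calculation described above, here is a minimal pure-Python sketch. The function name `pearson_r` is illustrative and is not part of pandas-profiling; the sample values are the cases and tests_positive figures from the first five rows of this dataset.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)  # raises ZeroDivisionError for a constant input

# cases and tests_positive for New York, 7/1/2020 to 7/5/2020 (first rows of the dataset)
cases = [398770, 399642, 400561, 401286, 401822]
tests_positive = [394079, 394954, 395872, 396598, 397131]

r = pearson_r(cases, tests_positive)
print(f"r = {r:.6f}")

# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r(cases, [2 * y + 3 for y in tests_positive])
print(f"r after rescaling = {r_rescaled:.6f}")
```

On this small sample the two series differ by an almost constant offset, so r is very close to +1, in line with the high-correlation warning reported for these two variables.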
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
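
The rank-based calculation can be sketched as follows. The helper names are illustrative, and giving tied values the average of their positions is an assumption of this sketch (a common convention), not something the report specifies.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of X and Y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# A monotonic but nonlinear relationship (y = x**3).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
print(f"rho = {rho:.3f}")
```

Because only the ordering of the values matters, a strictly increasing nonlinear relationship such as y = x³ still gives ρ = 1, whereas Pearson's r on the same data would be below 1.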
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
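
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name is illustrative, and library implementations commonly use a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    concordant = discordant = 0
    pairs = list(combinations(zip(xs, ys), 2))
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie in X or Y; it counts towards neither group
    return (concordant - discordant) / len(pairs)

# Four observations give six pairs; only the pair (2, 3) / (3, 2) is discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(f"tau = {tau:.4f}")
```

This quadratic-time loop is fine for illustration; for a column with thousands of rows a library routine would be the more practical choice.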
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
| | state_fips | state_name | lat | long | date | cases | deaths | tests_positive | tests_negative | tests_pending | tests | patients_icu | patients_hosp | patients_vent | recovered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | New York | 42.913397 | -75.596272 | 7/1/2020 | 398770 | 31791 | 394079.0 | 3577569.0 | NaN | 3971648.0 | 226.0 | 879.0 | 139.0 | 70590.0 |
| 1 | 36 | New York | 42.913397 | -75.596272 | 7/2/2020 | 399642 | 31814 | 394954.0 | 3646639.0 | NaN | 4041593.0 | 209.0 | 878.0 | 129.0 | 70698.0 |
| 2 | 36 | New York | 42.913397 | -75.596272 | 7/3/2020 | 400561 | 31836 | 395872.0 | 3712113.0 | NaN | 4107985.0 | 188.0 | 857.0 | 125.0 | 70794.0 |
| 3 | 36 | New York | 42.913397 | -75.596272 | 7/4/2020 | 401286 | 31860 | 396598.0 | 3773790.0 | NaN | 4170388.0 | 190.0 | 844.0 | 119.0 | 70877.0 |
| 4 | 36 | New York | 42.913397 | -75.596272 | 7/5/2020 | 401822 | 31895 | 397131.0 | 3836672.0 | NaN | 4233803.0 | 178.0 | 832.0 | 116.0 | 70968.0 |
| 5 | 36 | New York | 42.913397 | -75.596272 | 7/6/2020 | 402338 | 31911 | 397649.0 | 3890482.0 | NaN | 4288131.0 | 170.0 | 817.0 | 103.0 | 71040.0 |
| 6 | 36 | New York | 42.913397 | -75.596272 | 7/7/2020 | 402928 | 31934 | 398237.0 | 3946630.0 | NaN | 4344867.0 | 160.0 | 836.0 | 103.0 | 71091.0 |
| 7 | 36 | New York | 42.913397 | -75.596272 | 7/8/2020 | 403619 | 31945 | 398929.0 | 4003523.0 | NaN | 4402452.0 | 166.0 | 841.0 | 97.0 | 71185.0 |
| 8 | 36 | New York | 42.913397 | -75.596272 | 7/9/2020 | 404207 | 31979 | 399513.0 | 4068503.0 | NaN | 4468016.0 | 173.0 | 851.0 | 98.0 | 71279.0 |
| 9 | 36 | New York | 42.913397 | -75.596272 | 7/10/2020 | 404997 | 32004 | 400299.0 | 4141275.0 | NaN | 4541574.0 | 178.0 | 826.0 | 92.0 | 71371.0 |
Last rows
| | state_fips | state_name | lat | long | date | cases | deaths | tests_positive | tests_negative | tests_pending | tests | patients_icu | patients_hosp | patients_vent | recovered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8144 | 36 | New York | 42.913397 | -75.596272 | 6/21/2020 | 392702 | 30884 | 387936.0 | 3007383.0 | NaN | 3395319.0 | 332.0 | 1142.0 | 237.0 | 69506.0 |
| 8145 | 36 | New York | 42.913397 | -75.596272 | 6/22/2020 | 393257 | 30934 | 388488.0 | 3063611.0 | NaN | 3452099.0 | 330.0 | 1122.0 | 228.0 | 69710.0 |
| 8146 | 36 | New York | 42.913397 | -75.596272 | 6/23/2020 | 393855 | 30970 | 389085.0 | 3111723.0 | NaN | 3500808.0 | 302.0 | 1104.0 | 228.0 | 69710.0 |
| 8147 | 36 | New York | 42.913397 | -75.596272 | 6/24/2020 | 394430 | 31001 | 389666.0 | 3162286.0 | NaN | 3551952.0 | 290.0 | 1071.0 | 228.0 | 69710.0 |
| 8148 | 36 | New York | 42.913397 | -75.596272 | 6/25/2020 | 395168 | 31029 | 390415.0 | 3229179.0 | NaN | 3619594.0 | 270.0 | 996.0 | 167.0 | 70010.0 |
| 8149 | 36 | New York | 42.913397 | -75.596272 | 6/26/2020 | 395972 | 31075 | 391220.0 | 3290097.0 | NaN | 3681317.0 | 244.0 | 951.0 | 167.0 | 70010.0 |
| 8150 | 36 | New York | 42.913397 | -75.596272 | 6/27/2020 | 396669 | 31105 | 391923.0 | 3362656.0 | NaN | 3754579.0 | 230.0 | 908.0 | 167.0 | 70010.0 |
| 8151 | 36 | New York | 42.913397 | -75.596272 | 6/28/2020 | 397293 | 31137 | 392539.0 | 3423946.0 | NaN | 3816485.0 | 229.0 | 869.0 | 167.0 | 70010.0 |
| 8152 | 36 | New York | 42.913397 | -75.596272 | 6/29/2020 | 397684 | 31143 | 392930.0 | 3469983.0 | NaN | 3862913.0 | 216.0 | 853.0 | 136.0 | 70435.0 |
| 8153 | 36 | New York | 42.913397 | -75.596272 | 6/30/2020 | 398142 | 31776 | 393454.0 | 3521484.0 | NaN | 3914938.0 | 217.0 | 891.0 | 137.0 | 70487.0 |